1. INTRODUCTION

In Indonesia, especially in big cities, the number of vehicles is growing very quickly because people prefer private vehicles to public transport. This growth is not matched by adequate road construction. Too many vehicles on inadequate roads cause congestion, which is detrimental to society, so roads need to be built or widened to reduce it [1]. Such construction requires careful planning and must match actual needs. Locating congestion points requires data on the number of vehicles passing each location: the more vehicles that pass a location, the stronger the case for building or widening the road there. Every highway carries several types of vehicles, such as cars, trucks, buses, and motorcycles, and counts for each type can inform development or road-widening plans. The number of vehicles passing each day can be counted automatically by an application that implements a smart system [2].

As research on artificial intelligence develops, one active area is object detection, which recognizes objects in an image. Object detection is part of computer vision, the study of how computers see and analyze objects in images. It detects or recognizes objects in an image based on their shape or color, or by learning from a purpose-built dataset. Object detection can be implemented in various ways, including the convolutional neural network (CNN) method and the you only look once (YOLO) detection system. Redmon et al. [3] showed that YOLO recognizes objects in an image quickly, which makes it well suited to real-time object detection on video [4]-[7].

In real-time object detection, speed is critical. Unlike a single image, a video contains 24 frames per second (fps) or more. If detecting objects in each frame takes too long, every frame is delayed and the video stutters. Applying YOLO object detection in an application can help count and classify every vehicle that passes on the highway in real time from CCTV video. Passing vehicles are counted and classified automatically based on their detection results and accuracy level [5].
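As a rough illustration of how passing vehicles can be counted from per-frame detections, the sketch below counts a vehicle once when the centre of its bounding box crosses a horizontal counting line. It assumes a separate tracker assigns each vehicle a stable `track_id` across frames. `Detection`, `LineCounter`, and the 0.4 confidence threshold are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    track_id: int      # hypothetical: assumes a tracker gives each vehicle a stable ID
    label: str         # e.g. "car", "truck", "bus", "motorcycle"
    confidence: float  # detector score in [0, 1]
    cy: float          # vertical centre of the bounding box, in pixels


@dataclass
class LineCounter:
    line_y: float          # y-coordinate of the counting line, in pixels
    min_conf: float = 0.4  # hypothetical threshold; weaker detections are ignored
    counts: dict = field(default_factory=dict)
    _last_cy: dict = field(default_factory=dict)
    _counted: set = field(default_factory=set)

    def update(self, detections):
        """Process one frame; count each track once when its centre crosses the line."""
        for det in detections:
            if det.confidence < self.min_conf:
                continue
            last = self._last_cy.get(det.track_id)
            self._last_cy[det.track_id] = det.cy
            if last is None or det.track_id in self._counted:
                continue
            # The centre changed sides (or landed on the line) between two frames.
            if last != det.cy and (last - self.line_y) * (det.cy - self.line_y) <= 0:
                self.counts[det.label] = self.counts.get(det.label, 0) + 1
                self._counted.add(det.track_id)
        return self.counts


counter = LineCounter(line_y=300)
counter.update([Detection(1, "car", 0.85, 280.0)])
counter.update([Detection(1, "car", 0.85, 310.0)])
print(counter.counts)  # {'car': 1}
```

Remembering which track IDs have already been counted prevents a vehicle that lingers near the line from being counted twice.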

Based on the description above, an intelligent system that uses YOLO object detection can classify and count every vehicle passing on the highway in real time, every day. The resulting counts and classifications can inform decisions on road construction or widening [8].


2. RESEARCH METHOD 

YOLO is implemented as a convolutional neural network and evaluated on a detection dataset. The initial convolutional layers of the network extract features from the image, while the fully connected layers predict the output probabilities and coordinates. A fast version, Fast YOLO, is designed to push the boundaries of fast object detection. It uses a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers. Apart from the size of the network, all training and testing parameters are the same for YOLO and Fast YOLO. The final output of the network is a 7x7x30 tensor of predictions [3], [9].
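The 7x7x30 shape follows from the YOLOv1 design in [3]. The image is divided into an S x S grid (S = 7). Each cell predicts B = 2 boxes with five values each (x, y, w, h, confidence) plus C = 20 class probabilities for the PASCAL VOC classes. The sketch below computes this shape. The four-class case is a hypothetical illustration: it only applies if the same YOLOv1 output layer were retrained for the four vehicle classes, and the paper does not state which YOLO version it uses.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Shape of the YOLOv1 prediction tensor: S x S cells, each with B boxes
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


print(yolo_v1_output_shape())     # (7, 7, 30) for the 20 PASCAL VOC classes
print(yolo_v1_output_shape(C=4))  # (7, 7, 14) for car, truck, bus, motorcycle
```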

YOLO reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. With this approach, you only look once (YOLO) at an image to predict which objects are present and where they are. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection [10]-[14].
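To make "from pixels to bounding box coordinates and class probabilities" concrete, the sketch below decodes one grid cell's prediction into an image-space box. It follows the YOLOv1 parameterization in [3]. The box centre (x, y) is relative to the cell, width and height are relative to the whole image and predicted as square roots, and the class-specific score is the box confidence multiplied by the class probability. The function name, argument layout, and the class order used in the example are hypothetical.

```python
def decode_cell(row, col, pred, class_probs, S=7, img_w=640, img_h=480):
    """Turn one YOLOv1-style grid-cell prediction into an image-space box.

    pred = (x, y, sqrt_w, sqrt_h, conf): x, y are the box centre relative to the
    cell; sqrt_w, sqrt_h are square roots of width/height relative to the image.
    """
    x, y, sqrt_w, sqrt_h, conf = pred
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    w = sqrt_w ** 2 * img_w
    h = sqrt_h ** 2 * img_h
    # Class-specific confidence = box confidence * Pr(class | object).
    scores = [conf * p for p in class_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return box, best, scores[best]


CLASSES = ["car", "truck", "bus", "motorcycle"]  # hypothetical class order
box, idx, score = decode_cell(3, 3, (0.5, 0.5, 0.5, 0.5, 0.8), [0.1, 0.7, 0.1, 0.1],
                              img_w=700, img_h=700)
print(box, CLASSES[idx])  # (262.5, 262.5, 437.5, 437.5) truck
```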

First, because YOLO frames detection as a regression problem, it does not need a complex pipeline: the neural network is simply run on a new image at test time to predict detections. The base network runs at 45 frames per second with no batch processing, and a fast version runs at more than 150 fps. This means streaming video can be processed in real time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems.
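The frame-rate figures can be checked with simple arithmetic. At frame rate f, each frame must be processed within 1000/f milliseconds, so a detector keeps up with a video when its per-frame time is no longer than the video's frame interval. The sketch compares the 24 fps video rate mentioned in the introduction with the 45 fps and 150 fps figures above. It is a simplification that assumes detection is the only per-frame cost and ignores video decoding, drawing, and tracking overhead.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def keeps_up(detector_fps, video_fps):
    """True if per-frame detection time fits inside the video's frame interval."""
    return frame_budget_ms(detector_fps) <= frame_budget_ms(video_fps)


for name, fps in [("24 fps video", 24), ("YOLO base", 45), ("Fast YOLO", 150)]:
    print(f"{name}: {frame_budget_ms(fps):.1f} ms per frame")
# 24 fps video: 41.7 ms per frame
# YOLO base: 22.2 ms per frame
# Fast YOLO: 6.7 ms per frame
```

The base network's 22.2 ms per frame is consistent with the "less than 25 milliseconds" latency figure and fits comfortably within the 41.7 ms available per frame of 24 fps video.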

Second, YOLO reasons globally about the image when making predictions. Unlike sliding-window and region-proposal-based techniques, YOLO sees the entire image during training and test time, so it encodes contextual information about classes as well as their appearance. Fast R-CNN, a top detection method, mistakes background patches in an image for objects because it cannot see the larger context. YOLO makes less than half as many background errors as Fast R-CNN.

Third, YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods such as R-CNN by a wide margin. Because YOLO is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

In this smart system, the authors use four classes (vehicle types) for detection: cars, trucks, buses, and motorcycles. Training starts from pretrained YOLO weights and runs in GPU mode for 200 epochs. The training process produces weight files that are then used to predict objects [15]-[17].
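The paper does not name its training framework. The sketch below assumes the Darknet framework commonly used with pretrained YOLO weights and generates the class-names file and `.data` file for the four vehicle classes. It also includes two helper calculations. The filter count uses the Darknet YOLOv3 convention (classes + 5) x 3, which applies only to a YOLOv3-style head with three anchors per scale. The epoch conversion reflects the fact that Darknet schedules training by `max_batches` (iterations) rather than epochs. All file paths, the batch size of 64, and the use of all 600 labeled images for training are hypothetical.

```python
import math

CLASSES = ["car", "truck", "bus", "motorcycle"]


def darknet_files(classes, train_list="data/train.txt", valid_list="data/valid.txt",
                  names_file="data/obj.names", backup_dir="backup/"):
    """Return the contents of a Darknet-style .names file and .data file."""
    names = "\n".join(classes) + "\n"
    data = (
        f"classes = {len(classes)}\n"
        f"train = {train_list}\n"
        f"valid = {valid_list}\n"
        f"names = {names_file}\n"
        f"backup = {backup_dir}\n"
    )
    return names, data


def yolov3_head_filters(num_classes, anchors_per_scale=3):
    """Filters in the conv layer before each YOLOv3 [yolo] layer."""
    return (num_classes + 5) * anchors_per_scale


def epochs_to_iterations(epochs, num_images, batch_size=64):
    """Darknet schedules training in batches (max_batches), not epochs."""
    return epochs * math.ceil(num_images / batch_size)


names, data = darknet_files(CLASSES)
print(data)
print(yolov3_head_filters(len(CLASSES)))  # 27
print(epochs_to_iterations(200, 600))     # 2000
```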


3. RESULTS AND DISCUSSION 

A total of 600 images were collected and labeled with the names of the objects they contain using the LabelImg software, and the labels were saved in Pascal VOC format. To run the program, the user selects the source video, draws the counting area (marked with a green line), and selects the weights file produced by the training process together with the configuration file. If the configuration is correct, the user presses the start button, and the program plays the video and starts classifying and counting vehicles. Vehicles that are detected and pass through the counting area are marked with a red bounding box [18].
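Pascal VOC stores each box as absolute pixel corners (xmin, ymin, xmax, ymax), while YOLO-style training tools typically expect one text line per object: a class index followed by the box centre, width, and height, all normalized by the image size. The paper does not say whether such a conversion was performed. The sketch below is a hypothetical conversion using only the Python standard library, and the label names in `CLASSES` are assumed.

```python
import xml.etree.ElementTree as ET

CLASSES = ["car", "truck", "bus", "motorcycle"]  # hypothetical label names


def voc_to_yolo(xml_text, classes=CLASSES):
    """Convert one Pascal VOC annotation to YOLO text lines:
    'class_id x_center y_center width height', all normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in classes:
            continue  # skip labels outside the four vehicle classes
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return "\n".join(lines)


sample = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object><name>car</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_yolo(sample))  # 0 0.312500 0.500000 0.312500 0.500000
```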


Int J Artif Intell, Vol. 10, No. 3, September 2021: 571 - 575 


Int J Artif Intell ISSN: 2252-8938 O 573 


Table 1 shows that, using a CNN on a 640x480 video, car objects were detected with an 88% success rate, while on a 640x360 video cars were detected with an 83% success rate. Similarly, for trucks the CNN performed well on the 640x480 video, detecting 100% of truck objects, compared with 90% on the 640x360 video [19], [20]. This is because objects must be captured at sufficient quality to be detected reliably, and the 640x360 video has the lower resolution of the two [21].


Table 1. Results of automated vehicle detection

Video A: CCTV Perintis Kemerdekaan Road, Banyumanik, Semarang (duration 1 minute 9 seconds, size 640x480)
Video B: Sample CCTV toll road (duration 6 minutes 24 seconds, size 640x360)

Video  Class        Actual  Detected  Accuracy
A      Cars         18      16        88%
A      Trucks       8       8         100%
A      Buses        0       0         0%
A      Motorcycles  23      22        95%
B      Cars         159     132       83%
B      Trucks       20      18        90%
B      Buses        2       2         100%
B      Motorcycles  3       3         100%
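The paper does not state how the Accuracy column is computed. However, every value in Table 1 is consistent with the detected count divided by the actual count, rounded down to a whole percent (for example 16/18 = 88.9%, reported as 88%, and 22/23 = 95.7%, reported as 95%). For buses in Video A, where no buses appeared, the reported 0% corresponds to the 0/0 case rather than to missed detections. The sketch below reproduces the column under that inferred formula. This is a count ratio: it cannot tell a missed vehicle apart from one offset by a false positive, and it is not capped at 100%.

```python
# Rows transcribed from Table 1: (video, class, actual, detected, reported accuracy %)
TABLE_1 = [
    ("A", "car", 18, 16, 88),
    ("A", "truck", 8, 8, 100),
    ("A", "bus", 0, 0, 0),
    ("A", "motorcycle", 23, 22, 95),
    ("B", "car", 159, 132, 83),
    ("B", "truck", 20, 18, 90),
    ("B", "bus", 2, 2, 100),
    ("B", "motorcycle", 3, 3, 100),
]


def count_accuracy_pct(actual, detected):
    """Detected / actual as a whole percentage, rounded down; 0 when actual is 0."""
    if actual == 0:
        return 0
    return 100 * detected // actual


for video, cls, actual, detected, reported in TABLE_1:
    print(video, cls, count_accuracy_pct(actual, detected), reported)
```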


Figure 1 shows the desktop application used to count vehicles automatically on a road where a CCTV camera has been installed. Figure 2 shows a vehicle object captured and detected by the application, which was built using a CNN algorithm.


Figure 2. Bounding box of a detected vehicle


YOLO makes far fewer background errors than Fast R-CNN, so using YOLO to eliminate background detections from Fast R-CNN gives a significant boost in performance. For every bounding box that Fast R-CNN predicts, the system checks whether YOLO predicts a similar box. If it does, that prediction receives a boost based on the probability predicted by YOLO [3], [22]-[25].
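A minimal sketch of the rescoring step described above, assuming boxes are (x1, y1, x2, y2) pixel corners and that "a similar box" means the same class with an intersection over union (IoU) of at least 0.5. The text does not specify the exact boost, so adding YOLO's probability multiplied by the overlap is a hypothetical choice, as are the function names and the threshold. The boosted values are ranking scores and can exceed 1.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescore(rcnn_dets, yolo_dets, min_iou=0.5):
    """Boost each Fast R-CNN detection that YOLO confirms with a similar box.

    Detections are (box, label, score). The boost yolo_prob * overlap is a
    hypothetical choice; unmatched detections keep their original score.
    """
    rescored = []
    for box, label, score in rcnn_dets:
        boost = 0.0
        for ybox, ylabel, yprob in yolo_dets:
            overlap = iou(box, ybox)
            if ylabel == label and overlap >= min_iou:
                boost = max(boost, yprob * overlap)
        rescored.append((box, label, score + boost))
    return rescored
```

Detections that YOLO does not confirm keep their original score, so background false positives from Fast R-CNN fall lower in the ranking rather than being deleted outright.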


The implementation of intelligent systems in automating vehicle detection on the road (Susanto ) 




4. CONCLUSION

We present the use of YOLO, a unified CNN-based model for object detection. The model is simple to construct and can be trained directly on full images. Unlike classifier-based approaches, YOLO is trained on a loss function that directly corresponds to detection performance, and the entire model is trained jointly. Fast YOLO is the fastest general-purpose object detector in the literature, and YOLO pushes the state of the art in real-time object detection. The confidence values generated range from 40% to 90%, depending on the viewpoint and clarity of the object. The accuracy measured during testing ranges from 80% to 95%.